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ABSTRACT 



o 

H 

lyj ' We present a preliminary analysis of the sensitivity of Anglo- Australian Planet 

. Search data to the orbital parameters of extrasolar planets. To do so, we have devel- 

oped new tools for the automatic analysis of large-scale simulations of Doppler velocity 
planet search data. One of these tools is the 2-Dimcnsional Kcplcrian Lomb-Scargle 
^ . periodogram, that enables the straightforward detection of exoplancts with high ec- 

' centricities (something the standard Lomb-Scargle periodogram routinely fails to do) . 

00 \ We used this technique to re-determine the orbital parameters of HD 20782b, with one 

' of the highest known exoplanet eccentricities (e = 0.97 ± 0.01). We also derive a set 

of detection criteria that do not depend on the distribution functions of fitted Keple- 
rian orbital parameters (which we show are non-Gaussian with pronounced, extended 
wings). Using these tools, we examine the selection functions in orbital period, eccen- 
OQ I tricity and planet mass of Anglo- Australian Planet Search data for three planets with 

. large-scale Monte Carlo-like simulations. We find that the detectability of exoplanets 

declines at high eccentricities. However, we also find that exoplanet detectability is a 
strong function of epoch-to-epoch data quality, number of observations, and period 
^ . sampling. This strongly suggests that simple parametrisations of the detectability of 

$H ' exoplanets based on "whole-of-survey" metrics may not be accurate. We have derived 

5^ , empirical relationships between the uncertainty estimates for orbital parameters that 

are derived from least-squares Keplerian fits to our simulations, and the true 99% 
limits for the errors in those parameters, which are larger than equivalent Gaussian 
limits by factors of 5-10. We quantify the rate at which false positives are made by our 
detection criteria, and find that they do not significantly affect our final conclusions. 
And finally, we find that there is a bias against measuring near-zero eccentricities, 
which becomes more significant in small, or low signal-to-noise-ratio, data sets. 

Key words: stars: planetary systems - methods: statistical - methods: numerical - 
stars: individual: HD20782, HD38382, HD179949 



1 INTRODUCTION 

Extra-solar planet detection using the Doppler method has 
played a dominant role in placing this field at the heart of 
astronomical research. The advances made in this field have 
been primarily due to the significantly increased stability 
of high-resolution spectrographs, and improved techniques 
for calibrating residual spectrograph variations. An excel- 
lent example of this can be seen in the Anglo- Australian 



Planet Search (A APS), where a long-term velocity preci- 
sion of 3ms~^ ove r the initial ~ 8 years of observation 
(|Tinnev et al.ll2005l ). ha s been improved in recent years to 
better than 2 ms" 1 (e.g. lO'Toole et al.ll2007l '). 

With the time baselines of Doppler surveys now ap- 
proaching (or exceeding) a decade, and routine velocity pre- 
cisions approaching 1 ms~^, we are now in a position to ask, 
and answer, critical questions about the underlying distribu- 
tions of exoplanet parameters. Questions such as, how com- 
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mon are gas-giant planets in Jupiter-like orbits (i.e. ~5AU 
near circular orbits)? How common are gas-giant planets in 
Earth-like orbits, likely to host habitable terrestrial satel- 
lites (i.e. ~1AU near circular orbits)? And, how common 
are low-mass planets in close orbits (i.e. ~<0.3AU with 

Msinz < lOMEarth)? 

To answer these questions, we must first characterise 
and quantify the selection effects that are present in our 
observations. It is clear that the Doppler velocity planet 
searches have inhe rent observational biases. For example, 
iGaudi et al.1 l|2005h have noted that there is a sharp cut-off 
at periods of ~ 3 days for planets detected in RV surveys, 
while transit surveys have found the majority of their plan- 
ets at periods below this. We need to know, in detail, what 
orbital parameter space Doppler surveys probe, and how 
their sensitivity varies over that orbital parameter space. 

Analysis of Doppler survey selection effects have, to 
date, been forced to ma. k e a v ariety of simplifying as- 
sumptions. iNaravan et al.l (|2005l ). for example, examined 
the detectability of short-period, close-in planets, and so 
were able to ignore the complicating effects of eccentric- 
ity. They derived an empirical relation for detectability as 
a f unction of the number o f observations and data qual- 
ity. IWittenmver et all (|2006l ) investigated the detectability 
of exoplanets in their data using simulations at e = and 
0.6, enabling them to determine that there are selection ef- 
fects at high eccentricity, but not to quantify them across 
the full range of eccentricities which exoplanets have been 
fo und to disp l ay. T he effects of eccentricity were considered 
bv lCummind ()2004l ). who derived empirical relationships for 
velocity thresholds relying on an _F-statistic for two different 
cases: when the orbital period is shorter than the time-span 
of the observations, and when it is longer. More r ecent ly, 
(in an expansion of wo rk begun in ICumming erall l|l999l) ). 
ICumming et al.l l|2008h used this technique to derive detec- 
tion thresholds, determine selection effects and measure the 
incompleteness of Keck Planet Search data, in order to inves- 
tigate the exoplanetary minimum mass and orbital period 
distributions present in that data. However, the analytical 
method used in these studies makes a number of simplify- 
ing assumptions: that individual velocity uncertainties can 
be represented with a Gaussian distribution; that observa- 
tions are evenly spaced; and that the number of independent 
periods probed by a data set can be quantified in a meaning- 
ful way (Marcy et al. 2005). Unfortunately, real observations 
violate all three of these assumptions. 

In this paper, therefore, we lay the groundwork for an 
investigation of the full range of physically interesting ex- 
oplanet parameters that Doppler data can probe (period 
P, eccentricity e and minimum planet mass Msini) using 
star-by-star, epoch-by-epoch Monte Carlo simulations, in an 
effort to understand what our Doppler data are telling us 
about the orbital parameters of exoplanets, while making 
as few assumptions as possible. We introduce a set of auto- 
mated planet detection criteria and combine it with large- 
scale simulations of Keplerian orbits for each star observed, 
to determine the sensitivity of our "as observed" data. 

1.1 Observations, sampling and data quality 

Objec ts in the AAPS ca t alogu e are listed in Ijones et al.l 
\2Q0% and iTinnev et al.l l|2003l '). Details of our observing 



progr am are described in more detail elsewhere l|Butler et al.l 
I2OOIII . Briefly, the data are taken using the University Col- 
lege London Echelle Spectrograph mounted at the coude 
focus of the Anglo- Australian Telescope (AAT). An iodine 
absorption cell is placed in the beam, imprinting a forest of 
molecular iodine absorption lines onto the stellar spectrum. 
These lines are used as a wavelengt h reference to derive high- 
precision velocities as described in iButler et all (119961 ). 

The target stars of the AAPS (in common with most 
Doppler search targets) are observed in a non-uniform way. 
First, observing runs are scheduled in blocks spread un- 
evenly throughout a semester, which necessarily needs to 
non-ideal (ie. non-logarithmic) period sampling. Second, the 
weather during each block of observations affects the time- 
sampling of data as well, with velocity precisions generally 
being poorer in poor weather conditions, and with bright 
objects tending to be generally observed more often, when 
conditions are poor. (Note that in this paper we use the me- 
dian measurement uncertainty of a given set of observations 
as an indicator of the data quality.) 

Finally, large amounts of data tend to be acquired for 
objects where a planet is thought to exist, and smaller 
amounts of data are acquired for stars where planets are 
thought unlikely (or where a possible planet has period 
longer than the current data string). As a result "high pri- 
ority" targets get observed more densely. As a result of all 
these effects, time-sampling can deviate markedly from the 
ideal of uniform logarithmic sampling in period space. 

After each observing block is completed, the data are 
processed to update a database of velocities. This is analysed 
periodically to look for objects showing significant variabil- 
ity or periodicity. These get promoted to the "higher pri- 
ority" status described above. This prioritisation analysis 
has been done to date using a Lomb-Scargle periodogram, 
Keplerian fitting based on the most significant periods, the 
determination of False Alarm Probabilities (|Marcv et al.l 
[20031) ■ and the application of simple tests asking "Have we 
seen at least one period?" and "Do subsequent data ob- 
tained from a high priority object match the prediction 
from initial fits?" This prioritisation maximises the rate at 
which exoplanets can be extracted from our data, but means 
that our survey database (like those of most Doppler planet 
searches) is quite non-uniform. Simulation of the as-observed 
data sets for all stars in our database is therefore the only 
way to quantify the non-uniform selection effects inherent 
in Doppler velocity planet searches. 



2 THE 2D KEPLERIAN LOMB-SCARGLE 
PERIODOGRAM 

A tool commonly used for detecting variability in light and 
radial v elocit y curves is the Lomb-Scargle (LS) periodogram 
iLombl [l976l : IScarglel 1 19821 ). It involves determining, as a 
function of frequency, the difference between the of a 
sinusoid fit to data and the of a- constant fit (with the 
the resulting "power" being normalised in some way) . There 
are well developed statistics surrounding LS power, allowing 
significa nce values to be attributed to possible detections 
(see e.g. ICummin"3 12004| . for a discussion). In most cases 
where the LS periodogram is applied (e.g. in most areas of 
stellar pulsation and close binarity) the signal under study 
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is approximately sinusoidal, and so the LS periodogram is 
applied appropriately. 

The LS periodogram is now increasingly being used 
in Doppler velocity planet searches where, however, circu- 
lar orbits (giving rise to sin usoidal velocity curves) are not 
common l|Butler et al.ll2006h . It therefore makes more sense 
to fit Kepleri a ns to data instead of sinusoids as discussed 
by ICummind l|2004l ). As orbital eccentricity is an impor- 
tant parameter in a Keplerian function, we have expanded 
the traditional LS periodogram to include two dimensions 
- that is, to examine power as a function of both period 
and eccentricity. We call this the 2D Keplerian LS (2DKLS) 
periodogram. The method we use to calculate the 2DKLS 
periodogram was briefly discussed in lO'Toole et all (|2007l ) 
and is described in more detail below. 
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2.1 Method 

The 2DKLS is an extension of the traditional Lomb-Scargle 
periodogram, where we vary period and eccentricity in the 
calculation of power. (While the argument of pericenter, u, 
is also important in determining the shape of the velocity 
curve, it does not impact the orbital period measurement in 
the same way as eccentricity.) This is also an extension of 
the one dime nsional Keplerian Lomb-Scargle periodogram 
introduced by I Cummin We find, however, that not 

fixing eccentricity, while more efficient computationally, al- 
lows for possible non-physical values (i.e. outside the range 
0-1). The "smoothness" of the periodogram also depends a 
lot more on the initial guesses for the free parameters. We 
use a grid of fixed periods and eccentricities to calculate the 
2DKLS, with e = — 0.98 in steps of 0.01, and periods on 
a logarithmic scale from 1 day up to the maximum possi- 
ble period of interest for that data sequence (in most cases 
4500 days for current AAPS data), on a spacing of 10~^ in 
logj^Q P. A Keplerian described by Equation [T] is then fitted 
to the data using a non-linear least squares fit ting routine 
with Levenberg-Marquardt minimisation from iPress et al.l 
(1 19861). 



HD2078a 



Vr{t) = K[cos{uj + v{t)) + ecosu\ + Vo 



(1) 



Here K is the semi-amplitude, v(t) is the true anomaly 
involving implicit dependence on the orbital period P and 
the time of periastron passage Tp, and Vo is the velocity 
zero-point. The true anomaly is derived by solving Kepler's 
equation M{t) = E{t) — esin E{t) where E(t) is the ec- 
centric anomaly and M{t) — 2Tit/P is the mean anomaly. 
The power, z{P,e), is determined using z{P,e) — = 
(Xmcan - Xkop)/4, where xLp is the goodness-of-fit for the 
best fit Keplerian model, and Xmoan is the equivalent for a 
constant fit to the data. For each value of P and e we find 
the values of the remaining parameters that min imise Xk^p 
and t herefore maximise z{P,e). As discussed by ICummind 
(|2004l ). when the noise level is not known in advance (i.e. 
for observations) z{P, e) must be normalised, in the 2DKLS 
case by Xkp.t^- This f orm of the 2DKLS was implemented by 
lO'Toole et al] (|2007 ') and is used in the next section. For the 
purposes of the simulations in this paper (described in Sec- 
tion [3|, the power is not normalised, because the noise level 
is an input parameter and is therefore known in advance. 

The 2DKLS has several advantages over the traditional 
LS periodogram. First, it is sensitive to high-eccentricity 
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Figure 1. (a) The 2D Keplerian Lomb-Scargle periodogram for 
HD20782. Dark areas indicate significant power. The dashed lines 
indicate the positions of the slices, (b) Three slices through the 
2DKLS at the fit eccentricity {top panel), e = 0.80 {middle panel) 
and e = 0.0 {bottom panel). The signal at the fit eccentricity is 
very strong, barely detectable at e = 0.80, and undetectable at 
e = 0.0. 



planets, which the traditional LS periodogram is not (as we 
show with an example in Section [2.2p . Second because the 
Keplerian functional form fitted by the 2DKLS is a better 
representation of real orbits, the peak in the 2DKLS power 
is higher than the traditional LS power. Third, the width of 
the 2DKLS power peak more accurately indicates the level 
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at which the eccentricity is determined by a given Doppler 
data set than the cross-terms in a single non-hnear least 
squares "best" Keplerian fit. (Real eccentricity uncertain- 
ties are invariably much larger than the least-squares cross- 
term uncertainties.) In addition, the 2DKLS can be used in a 
simil ar manner to that of the CLEAN algorithm (jHogboml 
1 19741 ). if a simultaneous fit of multiple Keplerians is also 
done. But perhaps most importantly for our purposes, the 
2DKLS allows for a simpler automation of the planet detec- 
tion process, as it much more rapidly narrows a Keplerian 
trial fit on the "correct" best estimate of period and eccen- 
tricity. 



2.2 Application to HD 20782 

The 2DKLS periodogram aids significan tly in the de t ection 
of high-eccentricity planets. As noted by I Jones et al.l ()2006l ) 
in their paper on the extremely eccentric planet, HD 20782b, 
detecting high eccentricity planets using traditional peri- 
odogram methods is extremely difficult. With the 2DKLS, 
however the detection of a planet like HD 20782b becomes 
straightforward. The 2DKLS periodogram for all AAPS data 
on HD 20782 up to 2007, October 1 is shown in Figure [Ha), 
along with slices through the 2DKLS at e = 0.97 (the best-fit 
Keplerian eccentricity), e = 0.8 and e = 0.0 in Figure [l] The 
planetary orbital signal is obvious at e=0.97, but the power 
peak becomes progressively smaller at lower eccentricities; 
it is already difficult to discern at e = 0.8, and (most crit- 
ically) completely undetectable at e = 0.0. In other words, 
this is a planet which an automated traditional LS "first- 
pass" analysis woul d never detec t. 

We update the IJones et al.l parameters for this planet 
using the most recent data in Table [1] and show the revised 
fit in Figure (2] The planet is in a highly eccentric orbit, 
however, further refinement of the orbit with confirmation 
of additional observations near periastron (i.e. near the large 
velocity excursion) are important. The last excursion for the 
e = 0.97 solution occurred in the window 2008 June 18-21, 
when (unfortunately) HD 20782 was inaccessibly behind the 
Sun, so further refinement of the orbit will have to wait until 
the beginning of 2010. Measured velocities are given in Ta- 
ble [2] To highlight the importance of constraining eccentric 
planets in the narrow windows when their velocities change 
most rapidly, we also show in Table [T] the results of the fit 
excluding the most extreme velocity point. The significant 
change in the best-fit orbital eccentricity (0.97 to 0.57) that 
occurs as a result of removing just one data point highlights 
the difficulties encountered in detecting and characterising 
eccentric planets. 



3 SIMULATIONS 

The goal of this work is to derive the underlying distribu- 
tions of the orbital parameters (period, eccentricity and min- 
imum mass) as revealed by our AAPS observations, so as 
to allow meaningful comparison with planet formation and 
evolution models. Each object in our catalogue has a dif- 
ferent brightness, has different intrinsic velocity variability, 
and has been observed at different epochs with varying see- 
ing and transparency conditions. The only way, therefore, to 
understand the selection functions inherent to our data set 



Table 1. Updated Orbital Parameters of HD 20782b 



Parameter 



All obs. 



without 
extreme pt. 



P(d) 


591.9±2.8 


577.9±2.6 


K (ms-i) 


185.3±49.7 


21.7±1.2 


e 


0.97±0.01 


0.57±0.04 


LO 


147.7±6.5° 


98.6±5.7° 


To (JD-2451000) 


83.8±8.2 


175.9±8.4 


Msini (Mjup) 


1.9±0.5 


0.73±0.05 


a (AU) 


1.381±0.005 


1.359±0.005 


No6» 


36 


35 


RMS (ms-i) 


5.6 


4.8 




2.34 


1.90 


jitter* (ms"-"^) 


2.21 


2.21 



* Stellar jitter is calculated using the updated prescription of J. 
Wright (2008, private communication). 
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Figure 2. Keplerian fit for HD 20782, along with the residuals 
after subtraction of the best-fit model. 



is to simulate it on a star-by-star and epoch-by-epoch basis. 
We have therefore begun a major program of Monte Carlo 
simulations. 

The time-stamps for the AAPS observations of our 
target stars were used to create artificial data sets for 
single planets (modelled as a single Keplerian using 
Equation [T] with an input period, eccentricity and planet 
mass). The input periods on a logarithmic grid from 
log J Q Pi — 0.0 to 3.6 in steps of 0.3 (or 1 to 3981 days); 
input eccentricities are on a grid from 0.0 to 0.9 in 
steps of 0.1; and planet masses are on a grid with M — 
(0.02, 0.05, 0.1, 0.2, 0.5, 1.0, 1.6, 2.3, 3.0, 4.0, 6.0, 9.0, 13.0, 
20.0) in units of M]. The semi-amplitudes for each artificial 
data set are derived using the following: 



Mi sin i 

7^ 



Pi{M, + A/iSini) 



1 -1/3 



27rG 



(2) 



where M* is the mass of the host star. Mi is the planet's 
mass, i is the (unknown) inclination of the system and G is 
the gravitational constant. The subscript i denotes the input 
parameter. Measured parameters will be denoted with a sub- 



Table 2. Velocities for HD 20782 with corresponding measure- 
ment uncertainties. 



JD 






RV 


(-2451000) 


(m s 




35.319456 


17.7 


1 

± 


2.3 


236.930648 


— 10.7 


1 

± 


3.3 


527.017315 


■3 9 


1 

it 


3.4 


630.882407 


25.5 


1 

± 


2.7 


768.308854 


1 n Q 


1 

it 


2.6 


828.110660 


— 11.8 


1 

± 


3.0 


829.274491 


— 10.8 


1 

± 


3.8 


829.996250 


—30.6 


1 

± 


8.7 


856.135301 


— 14.5 


it 


3.6 


919.006597 


—7.8 


1 

± 


2.9 


919.996296 


—5.8 


1 

± 


2.9 


983.890093 


0.000 


1 

± 


3.3 


1092.304375 


1 Q 7 


1 

it 


2.3 


1127.268137 


13.5 


1 

± 


2.8 


1152.163079 


inn 

ly.u 


1 

it 


2.5 


1187.159653 


20.7 


1 

± 


2.5 


1511.206498 


—6.5 


1 

± 


2.3 


1592.048162 


13.1 


1 

± 


2.3 


1654.960313 


11.4 


1 


2.2 


1859.305274 


-206.2 


± 


1.9 


1946.138453 


-20.6 


± 


2.0 


1947.122481 


-17.3 


± 


1.6 




-4.2 


± 


1 ^ 

-L .O 


2044.023669 


-3.3 


± 


2.2 


2045.960788 


-5.8 


± 


1.9 


2217.288060 


4.7 


± 


1.6 


2282.220295 


18.1 


± 


1.9 


2398.969109 


17.5 


± 


1.3 


2403.960670 


25.5 


± 


2.5 


2576.306902 


-12.7 


± 


1.5 


2632.281289 


-11.7 


± 


1.6 


2665.186505 


1.7 


± 


1.7 


3013.216410 


28.3 


± 


1.5 


3040.131498 


20.8 


± 


1.9 


3153.970057 


-14.5 


± 


2.1 


3375.246543 


10.2 


± 


1.6 



script m. Stellar isochrone masses from IValenti fc Fischerl 
l|2005l ) are used to estimate M« for each host star. 

At each epoch, the ideal Keplerian has noise added - 
which we model at present as being Gaussian, with a width 
given by the internal measurement uncertainty produced by 
that epoch's Doppler analysis. It is known that the mea- 
surement uncertainties themselves do not follow a Gaussian 
distribution, for a variety of reasons. A more realistic model 
for stellar noise in our simulations is currently planned. This 
will incorp orate stellar noise sources such a s magnetic ac- 
tivity (e.g. IWrigh^l2005l l. stellar oscillations (jO'Toole et all 
I2OO8I ) and stellar convection, and systematic measurement 
effects. 

The 2DKLS necessarily involves an increase in compu- 
tation load compared to the LS, meaning that parallelisation 
of code is vital. Each simulation takes anywhere between 20 
seconds and 10 minutes of CPU time per processor depend- 
ing on the number of data points. We have used the MPICH 
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Table 3. Properties of our target stars. AT is the time-span of 
the observations. 



Star 


V 


Spec 


N 


AT 


median 




(mag) 


Type 




(days) 


unc. (ms^-"-) 


HD 20782* 


7.36 


G3V 


35 


3119 


2.27 


HD 38382 


6.34 


F8V 


17 


2452 


3.80 


HD 179949* 


6.25 


F8V 


56 


2626 


5.26 



* Indicates planet host star 



implement at iorQ of the Message Passing Interface library to 
run our simulation analysis system in parallel. The analysis 
of our early simulations were run on a small Linux clus- 
ter comprising 10 processors at the Anglo- Australian Ob- 
servatory, as well as some of the 224 processors available 
through the Miracle facility at University College, Londonlf] 
Subsequent analyses were moved to the Swinburne Centre 
for Astrophysics and Supercomputin^ in July 2007 utilising 
around 160 processors per star. 

One hundred simulations have been constructed for 
each set of parameters (P, e, K), leading to 182 000 sim- 
ulations for each target object. In this paper we focus on 
resuhs for three stars: HD 20782, HD 179949 and HD 38382, 
whose relevant properties are shown in Table [3] The first 
two objects each h ave a known planet l|jones et al.l l2006l : 
iTinnev et al]|200ll ). However, to examine also the effects of 
sampling and number of observations, we also consider a 
further two subsets of the HD 179949 data - one using ev- 
ery second observation, and the other every fourth - and 
simulated those as well. 



3.1 Distributions of fitted Keplerian parameters 

In a study of exoplanet parameter uncertainties, iFordI (|2005l ) 
found that their distribution is typically non-Gaussian. In 
many ways this is unsurprising, considering the correlations 
existing between parameters, and as the description of Ke- 
plerian motion given by Equation [T] is highly non- linear. 

We can attempt to understand the uncertainty distri- 
butions of the Keplerian orbital parameters that are pro- 
duced in a least-squares fit to an observed Doppler data 
set, by using our simulations to look at the distributions of 
the orbital parameter errorfl (i.e. the differences between 
the input "i" and measured "m" simulation values). Fig- 
ure [3] shows the distribution of the errors in eccentricity 
for HD 179949, with log Pi fixed at 2.4 and e, fixed at 0.4 
(i.e. including eccentricity errors at all planet masses). The 
dashed line is a Gaussian fit to the histogram, while the solid 
line is a Lorentzian fit. The wings of the distribution diverge 
significantly from a Gaussian and are better matched by the 
Lorentzian. Note that we are not claiming the distribution 

^ http:/ /www-unix. mcs.anl.gov/mpi/mpichl/index. htm 

http:/ /www. ucl.ac.uk/silva/research-computing/ 
^ http: / / astronomy.swin.edu.au/supercomputing/ 
^ We distinguish here between the error of a measurement (i.e. by 
how much it is wrong, which can only be known when one knows 
the "right answer" as in these simulations) and its uncertainty 
(i.e. an estimate of the quality of a measurement in the absence 
of knowing the "right answer" ) . 
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logP,= Z.'l, e,= 0.4 



logP.= 2.7^ K,= 37.5 
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Figure 3. Distribution of em-ei for HD 179949 with log Pi = 2.4 
and Ci = 0.4. The dashed curve is a Gaussian fitted to the data, 
while the solid curve is a Lorentzian fit. The 99% confidence limit 
is 7.7 times larger than the equivalent 2.58(j limit (were Gaussian 
statistics to apply). 



Figure 5. Similar to Figure |3] except for Km — Ki with logP^ = 
2.7 and = 37.5ms-l. The 99% confidence limit is 10.4 times 
larger than the equivalent 2.58cr limit (were Gaussian statistics 
to apply). 
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Figure 4. Similar to Figure[3]except for Pm — Pi with log Pi = 0.9 
and ei = 0.0. The 99% confidence limit is 6.5 times larger than 
the equivalent 2.58a- limit (were Gaussian statistics to apply). 



is Lorentzian, simply that the extended wings of this func- 
tion is a better model of the extended wings of the observed 
distribution. Similarly, the distribution of measured periods 
and semi-amplitudes, shown in Figures |4]and[5]respectively, 
are non-Gaussian, with extended Lorentzian-like wings. The 
special case of ei = is not only non-Gaussian, but also non- 
sy mmetric. A n obvious consequence of this and the analysis 
of lFordI (|2005l ') is that Gaussian statistics for the orbital pa- 
rameter estimates (e.g. Scr or 5(7 limits) cannot be used as 
criteria for exoplanet detection. 

To demonstrate this, we show in Figures [3][5] both the 
99% limits of these error distributions and the data ranges 
that would correspond to such a limit were Gaussian statis- 
tics to apply (i.e. 2.58cr). In every case the observed 99% 
confidence limits are much larger by factors of ~5-10. Thus 
the real uncertainty on a Keplerian fit parameter is signif- 
icantly larger than that one would predict based on Gaus- 
sian statistics alone, and Gaussian statistics must be either 



avoided, or suitably modified, in any set of exoplanet de- 
tection criteria. We discuss a set of criteria that take this 
effect into account in Sectional and in particular, we exam- 
ine empirical relationships which can be used to calibrate 
and derive robust confidence limits for the orbital elements 
produced by least-squares Keplerian fits in Section [4.31 



4 DETECTION CRITERIA 

One of the important practical considerations of our simu- 
lation system is that it must be automated. We therefore 
require a set of criteria to decide whether a planet has been 
detected, without any human intervention. These will be ap- 
plied to the results of Keplerian fits to simulated data that 
should be both robust and not add significantly to the time 
budget of our analyses. 

When determining adequate criteria, there are two im- 
portant differences between the analysis of real and simu- 
lated data that must be considered. First, there is a dif- 
ference between simply trying to determine whether one of 
250 target stars has a planet, and whether one of millions 
of data sets has a real planet or not. In the former case, as 
much time can be spent as needed on trying to decide "by 
eye" whether it is real or not. For latter case automation is 
essential. 

Second, the aim of these simulations is not the same as 
that for planet discovery. In the latter, the bias is towards 
seeing whether a planet has been found orbiting a target 
star, with subsequent observations being used to confirm or 
deny its status. For a simulated detection there are no subse- 
quent observations - one has to decide the status using only 
the simulated data. Moreover, there is a simulated planet 
present in every data set. What we need to know whether 
it has been detected with sufficient robustness that we can 
be sure (within a given confidence level), that it is a real 
detection. As such the simulated detection criteria will al- 
most always be more stringent than the criteria used for the 
discovery of an exoplanet from actual planet search data. 

Considered another way, these simulations are aiming 
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to generalise the same process that is used in estimating 
the l/14iax volume that is represented by each star going 
into the estimation of a luminosity function. In this case we 
are estimating for each star Knax in the (P, e, K) phase 
space that is accessible to a given set of data. This means 
that a selection of a set of detection criteria that determines 
Knax is arbitrary - it determines the sensitivity, but not 
the results, of our survey. Have too loose a set of detection 
criteria, and you find lots of objects, and have a large Vmax, 
but are subject to false positives. Have too tight a set of 
detection criteria, and you find few objects and have a small 
Knax, but are much less subject to the biases associated with 
false positives. 



4.1 Previous sets of detection criteria 

There are several methods that have previous ly been used 
to det ermine the reality of a planet detection. iMarcv et al.l 
presented an exceUent discussion on two different 
approaches to the False Alarm Probability or FAP. The 
FAF is the probability that the best-fit model Keplerian 
could have arisen simply as a result of noise fluctuations . 
The first involves the classical F-test (jBevingtonl Il969l ') 
which has several weaknesses: it assumes that the uncer- 
tainties of the measurements follow a Gaussian distribution 
- even the smallest deviation from normality has been re- 
ported to be extremely non-robust (jLindmanlll974h : it can- 
not properly account for the actual u neven temporal sam- 
pling of the observations (e.g. iMarcv et al. 2005); and it de- 
pends on the number of independent freq uencies - a num - 
ber which can only b e appro ximated. Both lCummind (|2004l ) 
and ICumming et al.l (|2008l ) used the F-teat to investigate 
the detectability of planets in Doppler surveys based on the 
analytical FAPs. 

The other method presented bv lMarcv et al.l ([2005) is 
empirical and involves creating 1000 or more quasi-artificial 
data sets by generating randomly scrambling the velocities, 
but keeping the times the same, and then analysing the new 
sets in the same way as the original data. The number of 
xi values similar to the value for the candidate detection is 
then used to construct a FAF. This approach has the ad- 
vantage that the distribution of uncertainties and temporal 
sampling of the observations are unimportant. Used alone it 
cannot distinguish between peaks with similar significance 
in a power spectrum of the actual observations, as these are 
likely to have similar FAFs. 

Because of the large number of simulations we plan 
to carry out, it is important we have a simple set of cri- 
teria that can quickly test the reliability of a detection. 
This automatically rules out several approaches that are 
in themselve s computationally intensive; in particular the 
iMarcv et al.l scrambling method to determine a FAF de- 
scribed above would add considerably to the ti me budget 
of our simulation analysis. The _F-test used by ICummin3 
l|2004l ) is also inappropriate for two reasons: first, AAFS 
data has uneven temporal sampling, and second, because 
we have incorporated our velocity measurement uncertain- 
ties - which do not follow a Gaussian distribution - into the 
our noise estimates, the simulated velocities show small de- 
partures from a pure Gaussian. We have therefore developed 
our own set of criteria, discussed below. 



4.2 Our criteria 

There are two final points to consider before we present our 
criteria. First, the criteria we use must be "blind" - that is, 
they must only be based on measured quantities, and have 
no reliance on the input orbital parameters of the simula- 
tions. Second, the number of false positives should be as low 
as possible. The following are by no means the only criteria 
that we could use, however, as we show in Section [4.41 they 
produce an acceptable fraction of false positives. The set of 
criteria we have found to be useful are: 

RMS(sim) > RMS (res) (i) 

The RMS of the simulated observations must be greater than 
or equal to the RMS of the residuals of the best-fit model. 
That is, by subtracting a Keplerian model from the time 
series, the overall scatter should decrease, rather than in- 
crease. 

K,^>2SK + RMS{res) (ii) 

The measured semi-amplitude must be greater than or equal 
to twice the semi- amplitude uncertainty, plus the RMS of 
the residuals of the best fit model. The uncertainties from 
the fit procedure we have used are correlated - the off-axis 
terms in the covariance matrix calculated in the non-linear 
least squares fit are non-zero - which means that the fit 
uncertainties are lower limits. We approximate the semi- 
amplitude uncertainties here as twice the fit uncertainty plus 
the RMS of the residuals of the best-fit model. This criterion 
rejects poorly constrained semi-amplitudes (and therefore 
poorly-constrained planet masses). 

Pm > 2SP (ifi) 

The measured period must be greater than or equal to twice 
the period uncertainty. As with the semi-amplitude uncer- 
tainties in criterion (|n]) above, the period uncertainties are 
underestimated. We double the fit uncertainty to reject 
poorly constrained periods. 

xl < 3 (iv) 

The xt value must be less than or equal to 3. This is a some- 
what arbitrary cut, however it significantly reduces the num- 
ber of false positives at high eccentricities, as discussed in 
Section 1331 

As a simple test of our criteria we have used them to 
check whether each of the published planets in the AAFS 
target catalogue would be found. As part of this test we 
include a stellar jitter term in our fits, despite not includ- 
ing it in our simulations. This is because jitter is present 
in the real observations (and contributes to the individual 
measurement uncertainties of those real observations), while 
it is not present in our simulated observations, which have 
only had Gaussian noise added to them. By including the 
appropriate jitter values in our analyses (J. Wright, 2008 
private communication), we find that 15 of the known plan- 
ets satisfy our criteria. Excluding the Xv cut, all bar one of 
the planets pass the test. This bares no reflection of the re- 
ality of the planets, but rather a reflection of the strictness 
of our criteria. Other effects at play here are the presence 
of multiple companions, and accuracy of the jitter measure- 
ments used, which is only around ± 50% (J. Wright, private 
communication; note that the jitter contains unquantified 
time- variability) . 
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Figure 6. Plot of 99% limits in errors (i.e. difference between in- 
put and measured orbital parameters) versus median uncertainty 
in the parameter from least-squares Keplerian fits to the simu- 
lated data, for each orbital parameter for HD 20782. The dashed 
line shows a power law with the parameters listed in Table 141 



Table 4. Power law exponents for each star. The power laws 
have the form W^Xl^, for X = 5Pm,5em and a + /3logX for 

X=5Kr„,. 



Parameter 


HD 20782 


HD 38382 


HD 179949 


ap 


1.41±0.03 


1.67±0.02 


1.50±0.02 


I3p 


l.OOiO.Ol 


1.07±0.01 


1.02±0.01 


OtK 


56.6±1.3 


90.0±6.0 


75.6±4.3 


Pk 


60.3±4.9 


134.5±15.6 


155.0±10.2 




1.46±0.13 


1.82±0.26 


2.10±0.26 


Pe 


1.12±0.06 


1.33±0.15 


1.41±0.13 



Doppler data sets) and our observed error distributions. To 
examine these relationships we compare the uncertainties 
from the least-squares Keplerian fit and the 99% confidence 
range from the simulations. 

In Figure [6] we show the 99% confidence limit as a func- 
tion of the median uncertainty of the fit SPm (top), SKm 
(middle) and Sem (bottom) for HD 20782 at at fixed pairs 
of Pi and Ci or Pi and Ki (as in Section FS.ip . There is clearly 
a correlation between these parameters, which we have char- 
acterised in the figure with a power-law for each parameter. 
The power laws have the form: 99% confidence = 10"X'', 
where X = SPm, Scm and 99% confidence = a -I- /?log SKm 
for semi-amplitude. The parameters a and /3 from these fits 
for HD 20782 are listed in Table |4] (along with the equiv- 
alent parameters for similar fits to the equivalent data for 
HD 38382 and HD 179949). 

For the period data, it is clear that the power-law slope 
is consistent with an index of 1 - that is, the 99% confi- 
dence limits are linearly related to the la fit uncertainties 
by factors of 26-47, or equivalently Gaussian statistics over- 
estimate the 99% limits for a period determination from 
these simulated Doppler data by factors of between 10-18. 
For eccentricity the correlation is weaker, and the power-law 
fit indicated is slightly above 1. More importantly, the 99% 
confidence limits are about 10-50 times larger than those 
which would be derived from simply trusting the Gaussian 
nature of the uncertainties coming from a least-squares Ke- 
plerian fit. This reflects the fact that eccentricity is, in gen- 
eral, very poorly constrained by Doppler data sets, as was 
seen in our analysis of the 2DKLS. The correlation between 
the observed 99% confldence limits and the Keplerian fit 
uncertainties for semi- amplitude K is poor, and again the 
99% confidence limits in K are larger than those one would 
predict from the Keplerian fit uncertainties. 

From our analysis of these three data sets, it would 
not appear that there are any general conclusions that can 
be reached, for every star and every data set, on how to 
relate real 99% confidence limits to Keplerian fit uncertain- 
ties (other than that Keplerian fit uncertainties significantly 
under-estimate - by factors of greater than 10 - the real 
confidence limits) . However, it is clear that for a given sim- 
ulated data set, that meaningful correlations can be derived 
and applied. We have therefore used these relationships to 
convert fit uncertainties into meaningful confidence limits 
for our subsequent analysis of false positives. 



4.3 Orbital parameter confidence limits 

We have seen in the discussion of distribution functions in 
Section FS.ll that Gaussian statistics cannot be used to model 
the distribution of errors in orbital parameters that arise 
from least-squares Keplerian fits to our simulated data (and 
obviously they similarly cannot be used to model the uncer- 
tainties in orbital parameters for detected Doppler planets 
from real data sets either) . 

The orbital parameter error distribution functions do 
contain information that is useful, however, in that they 
allow us to empirically calibrate the relationship between 
the uncertainties that arise from the covariance matrix in a 
least-squares Keplerian fit (i.e. the source of the traditional 
uncertainties in orbital parameters produced in analysing 



4.4 False positives 

It is important that the number of false positives (i.e. the 
number of incorrect detections) triggered by our detection 
criteria be (a) quantifiable, and (b) as small as possible. We 
adopt as our "incorrectness" criterion that the measured or- 
bital parameter differs from the input orbital parameter by 
more than the 99% confidence limit for that orbital param- 
eter (as derived in Section [4. 3p . 

For each simulation that results in a detection, we test 
for "correctness" by asking; 

(i) is the error in period (i.e. the difference between mea- 
sured period and input period) less than the 99% confidence 
limit (as derived from calibrating the least-squares fit period 
uncertainty to a true confidence limit as described in I4.3|l ? 
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Figure 7. False positives (open squares) for HD 20782 as a func- 
tion of simulated input eccentricity. The filled squares represent 
the corresponding false positives excluding the xi detection cri- 
terion (see section 14.4.11 in the text). Without this xi cut, the 
simulations would be subject to an unacceptable level of false 
positives at high eccentricities. 



If the error is larger than the 99% hmit, we call the period 
incorrect. We also ask, 

(ii) is the error in the the semi-amplitude larger than the 
99% confidence limit? If it is then we call the semi-amplitude 
incorrect. 

If both the period and the semi-amplitude are deter- 
mined to be incorrect (at the 99% level), we describe this as 
an incorrect detection, or false positive. The false positive 
rate is then the ratio of the number of incorrect detections 
to the total number of detections. Averaging over all pa- 
rameters, we find the false positive rate due to incorrect 
period and semi-amplitude to be 1.2% for HD 179949, 2.2% 
for HD 20782, and 9.0% for HD 38382. In the sections that 
follow, we look in more detail at these numbers and at their 
trends as a function of input orbital parameters. 

4-4- 1 The !E: 3 Detection Criterion 

We are now in a position to demonstrate our reasons for 
include this particular criterion, which we do in Figure [T] 
which shows the false positive fraction as a function of sim- 
ulated input eccentricity both with this criterion applied 
(open squares) and not applied (filled squares). It immedi- 
ately becomes apparent that without this particular crite- 
rion being applied, our data set is subject to an unacceptably 
large fraction of false positives at high eccentricity. Even 
with this criterion being applied the number of false posi- 
tives shows an increase over the "base" level of 1-2% seen 
at low eccentricity, up to 6% at e=0.9. (Similar patterns are 
seen for the other sets of simulations for the other stars.) 

4.4-2 Eccentricity 

We show in the top panel of Figure [8] the rate of false posi- 
tives for each object as a function of input eccentricity, at all 
values of Pi and Mi. For HD 179949 and HD 20782, the per- 
centage of false positives remains at ~ 1% up to Ci ~ 0.6 and 
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Figure 8. Top panel: False positives for HD 179949 (diamonds), 
HD 38382 (triangles) and HD 20782 (squares) as a function of 
input eccentricity. Bottom panel: False positives of each of the 
HD 179949 subsets - the full set of 56 epochs (triangles), the 28 
epoch subset (crosses) and the 14 epoch subset (circles) - demon- 
strating the significant increase in false positive detections at low 
observation density. A dotted line is shown at the 1% false posi- 
tive level. 



then increases to 3-6% at Ci = 0.9. In the case of HD 38382, 
the false positive rate is around 7% up to Ci ~ 0.5, increasing 
to 18% at Ci = 0.9. The higher false positive rate for this 
star is due to its having much fewer observations (just 17 
epochs) than the other two stars - fewer observations make 
it harder to detected an exoplanet, and conversely makes 
that data set more subject to false positive detections. To 
demonstrate this, we show in the bottom panel of the same 
figure the false positive rate for the three HD 179949 subsets. 
The 28 epoch subset (crosses) has 2-3 times as many false 
positives as the full HD 179949 data set (with 56 epochs; dia- 
monds) and the 14 epoch subset matches the HD 38382 false 
positive rate at low eccentricities and then becomes worse 
as eccentricity increases. The star- by-star approach in this 
case reproduces the expected behaviour - more observations 
give more confidence in an exoplanetary detection. 



4.4.3 Period 

The false positive rates for our three stars are shown as a 
function of Pi (at all values of Ci and Mi) in Figure [Q] At 
periods longer than the time-span of the observations and 
below ~ 4 d, the rate of false positives is 4-9% for HD 179949 
and HD 20782. As with eccentricity, the false positive rate is 
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Figure 9. False positives of the measured period values for 
HD 179949, HD 38382 and HD 20782 as a function of input pe- 
riod. Symbols have the same meaning as Figure |8] 
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Figure 10. False positives of the measured semi-amplitude values 
for HD 179949, HD 20782 and HD 38382 as a function of input 
planet mass. Symbols have the same meaning as Figure |8] 



below ~ 2% for these stars. For HD 38382 the false positive 
rate steadily increases as a function of log P from around 
6% at short periods, to ~ 25% at longer periods - again this 
is due to the sparser sampling of the HD 38382 data. 

4-4-4 Planet mass 

Finally, the false positive rate as a function of Mi (at all 
values of Pi and e^) is shown in Figure flUl While the large 
numbers of false positives at low mass might at first ap- 
pear troubling, it must be remembered that there is a sig- 
nificant selection effect against finding objects at very low 
meisses. Therefore the number of correct detections will de- 
cline steeply at very low masses (as low mass planets are very 
hard to detect), while the incorrect detection rate should 
remain roughly constant. This will lead to a systematic in- 
crease in the false positive rate at very low masses. 

We test this idea by showing the false positives as a 
function of the total number of simulations for each star in 
the top panel of Figure 1111 The incorrect detections seem 
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Figure 11. False positives as a function of total number of 
simulations for HD 179949 (diamonds), HD 20782 (squares) and 
HD 38382 (triangles) {top panel) and for the HD 179949 subsets 
{bottom panel): N = lA — dotted line; N = 28 - dashed line; 
N = 5d - solid line. 



to be approximately constant for HD 179949 and HD 20782, 
with values of ~l-2% for the former and ~2-3% for the 
latter. In the case of HD 38382, however, there is an in- 
crease from about 1.6Mjup up to around 20%. Comparing 
HD 38382 to the three subsets of HD 179949 (bottom panel 
of Figure llip , we see the cause of the increase is observation 
density. The subset with A'^ = 14 - approximately the same 
as HD 38382 - suffers from the same effect. The iV = 28 
subset shows the effect to a lesser degree and it is almost 
completely removed in the A'^ = 56 subset. This shows one 
weakness in our detection criteria and we are currently in- 
vestigating ways to minimise the problem. 



5 RESULTS 

Our automated detection criteria, and a means to analyse 
false positive rates, place us in a position to examine in 
detail the observational biases present in our data for the 
three stars under simulation. We define the detectability 
D' {Pi, Ci, Ali) as the detection rate as a function of Pi, Ci, 
and Aii - in the case of our simulations we performed 100 
realisations at each point in {Pi,ei,Mi) space, so we divide 
the number of detections by 100. False positives have been 
removed from the D'{Pi,ei, Mi) for each set of simulations. 
In Figure 1121 we show contour maps of detectability, for 
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Figure 12. Dctcctability of planets as a function of input period and input planet mass at 3 different values of eceentrieity (0.1, 0.6, 0.9) 
for HD 179949 (56 epochs, AT = 2626 d), HD 20782 (35 epochs, AT = 3119 d) and HD 38382 (17 epochs, AT = 2452 d). False positives 
have been subtracted from each star. The vertical dashed line in each panel indicates AT. 



each star, as a function of input period and planet mass for 
three different eccentricities (e = 0.1, 0.6, 0.9). Note that 
the HD 179949 map is for the full set of 56 observations. The 
vertical dashed line indicates the time-span of the observa- 
tions (AT). These detectability surfaces display a number 
of common features. 

First, it can be clearly seen that detectability is quite 
low at periods lon ger than th e time- span of the observations 
(as seen by both ICumminS l|2004 ) and IWittenmver et ahl 



(I2006I )'). and is also low at short periods (below ~2d). Both 
these effects are precisely what one would naively expect, 
based on our ability to sample the Doppler variability of our 
target stars. And both are reflected in the properties of the 
currently detected exoplanets. Few exoplanets are known at 
periods of 10 years or longer as a result of Doppler searches 
which are based on around a decade's worth of data. And, 
no Doppler exoplanet has been found at periods of less than 
~2 days without first being detected via a transit event. 
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Second, D'(Pi,ei, Mi) as a function of planet mass, for a 
given eccentricity, reveals the same general pattern for each 
star, with detectability decreasing with increasing period. 
This is not surprising, since (to first order) for data sets 
with similar Doppler measurement precision over time, the 
ability to detect an exoplanet is determined by the size of 
the semi-amplitude Km relative to that precision, which is 
in turn (from Equation [2} a function of Pm of the form 

M oc 1/VP. 

The combination of these two effects means that the 
general form of the D' {Pi,ei, Mi) surface is one of a tran- 
sition region (with slope Af oc 1/ ^/P) dividing highly de- 
tectable planets (or generally higher mass and shorter pe- 
riod) from undetectable ones (or lower mass and longer pe- 
riod), modified by a cut-on at short periods of ~ 2 days 
(determined by the shortest data sampling obtained), and a 
cut-off at long periods (determined by the length of the ob- 
servation string, AT). These are the general behaviours that 
one would expect. Of more interest is the detailed behaviour 
these surfaces reveal for each target. 

In particular, for example, we see that the steepness 
of the "transition region" between highly detectable, and 
mostly undetectable planets is a strong function of the num- 
ber of observations obtained. The slope in the transition re- 
gion is steep for HD 179949 and HD 20782, but shaUow for 
the more poorly sampled HD 38382 data. 

Moreover, the slope of the transition region is also a 
function of eccentricity, for all three stars, highly eccentric 
planets have a gently sloping transition region that mostly 
fills the entire available mass-period plane. Indeed, it is only 
at short periods that even the best sampled data sets have 
high detectability. 

To assist in the visualisation of a more detailed analysis 
of these general trends, we define the integrated detectabil- 
ity, Di'nti such that D{^f^{Pi) is simply the detectability at 
a given period Pi, over all and Mi (and by extension 
Di„t(Mi) and D[^f.{ei) axe defined as the detectability over 
all the other relevant parameters in each case). In partic- 
ular, we pay special attention to the three subsets of the 
HD 179949 data set (as described in Section ^ in order to 
examine the impact of data sampling. 




Figure 13. Fraction of simulated planets re-detected as a func- 
tion of input eccentricity, or D'^^^{ei), for HD 179949 with TV = 
14, 28, & 56 with false positives removed. This is over all periods 
and semi-amplitudes and shows the importance of data sampling 
and number of epochs for detecting highly eccentric planets. The 
points are connected to identify different trends for each star. 




Figure 14. Similar to Figure [T3l except for HD 20782, HD 38382 
and HD 179949. 



5.1 Eccentricity 

Figure [TH] shows the integrated detectability for each of the 
HD 179949 subsets as a function of eccentricity, I?int(ei)i 
recall that false positives have been removed. There is a 
clear difference at high eccentricity between each of the sub- 
samples. At N=14 (squares), Dint(ei) drops significantly at 
Bi = 0.5 and higher. At higher values of N (28 - triangles; 
56 - diamonds), the drop-off starts at higher eccentricity 
(ci > 0.6 for N=28 and > 0.7 for N=56). Below each of 
these values, D[^^{ei) i s approximately constant. As demon- 
strated elsewhere (e.g. ICummin J2004f ) . the implication here 
is that higher observation density makes detection of high 
eccentricity planets more likely. 

The subsets of HD 179949 show significant variations 
in i3i'nt(ei)) but is there a difference between stars? Figure 
[Til shows Z)j'„t(ei) for each of the three objects, HD 179949, 
HD 20782 and HD 38382. The shape of the curves, while in 
general decreasing at higher eccentricities, is different for 



each object. For example, when ei < 0.1 HD 179949 has the 
lowest fraction of planets redetected, however when d > 0.8 
it has the highest. Thus data sampling and quality are fun- 
damental to the selection effects present in planet search 
observations and a simple parametrisation of the detectabil- 
ity of exoplanet parame ters using "whole-of-survey" metrics 

- e.g. ICumming (|2004l ) - cannot be done. As an example, 
consider the case of HD 179949. Of the three sets of obser- 
vations we discuss here, the HD 179949 data have the high- 
est median measurement uncertainty (5.26 ms~^), and one 
might naively expect it's detectabilities to be the lowest. 
However, the observation density (equal to the observation 
time-span/number of observations or AT/N) is the highest 
at 47 days/epoch, which should counteract the first effect to 
some degree. It is not intuitively clear how to parametrise 
and compare the detectability of the HD 179949 observa- 
tions, with, for example, that of the HD 38382 observations 

- which have lower observation density but also lower me- 
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Figure 15. The difference between measured and input de- 
tectabilities as function of input eccentricity. There are fewer 
measured detections at = 0.0 and a small excess at other ec- 
centricities, peaking at = 0.1. 



dian measurement uncertainty - without the simulations we 
have carried out in this study. Therefore carrying out simu- 
lations on a star-by-star basis is the only way to understand 
the selection effects in Doppler velocity planet searches. 

5.2 A Bias against Zero Eccentricity Detections 

In the previous section, we examined D[^^{ei), the de- 
tectability at each e^, which is at all Pi and Mi. This is 
fine for the case of these simulations, where we know the in- 
put parameter values a priori. However, as this is never the 
case for actual Doppler planet data it is useful to consider 
our detectability at each e™; i.e. the measured eccentricities 
rather than the input eccentricities. We call this quantity 
Dj'nt(em). It is determined by counting the number of cor- 
rect detections - i.e. false positives are excluded - in equally 
spaced bins of , normalised by the number of simulations 
in each of bin. 

We now compare the two quantities, D[^^{em) and 
D(„t(ei) and show the difference between them for each of 
our three stars in Figure 1151 The striking feature is that 
when binned by measured eccentricity the detectability is 
lower by up to 15% at ei = than when binned by input 
eccentricity. At other eccentricities d, there is an opposite 
effect, albeit smaller. What could be causing these apparent 
biases, especially against finding zero eccentricity orbits? 

The answer can be seen in Figure [16] which shows the 
measured semi-amplitude. Km, as a function of the mea- 
sured eccentricity, e™, for HD 38382 where the input eccen- 
tricity is ei = 0.0. Also plotted is the median value of 6m over 
various ranges of Km (shown by the error bars in Km) and 
the median value of the fit error for over the same range 
of Km (shown by the error bars in Cm ). As semi-amplitude 
decreases the measured eccentricity (em) increases, i.e. the 
Keplerian fit to noisy data with a perfectly circular orbit 
(e = 0.0) tends towards a higher eccentricity. This has the 
effect of decreasing the number of = orbits and increas- 
ing the number of non-zero eccentricity orbits, in particular 
for the Em =0.1 bin of orbits (Figure [16}. At low values of 
K,n, up to one third of zero eccentricity orbits have best fits 
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Figure 16. Measured semi-amplitude Km as a function of mea- 
sured eccentricity em where = 0.0. Also plotted is the median 
value of Cm over various ranges of Km (shown by the error bars 
in Km) and the median value of the fit error for em over the same 
range of Km (shown by the error bars in Sm)- At low signal-to- 
noise ratios - and therefore low values of Km — there is a bias 
against measuring zero eccentricity orbits. 



that move them out of the e = bin. This occurs because 
the shape of the velocity curve is not well constrained in 
low signal-to-noise data, even if the orbital period and semi- 
amplitude are. For e; > 0.1 eccentricity orbits the distribu- 
tion of measured eccentricities is symmetric so the median 
of spread at low signal-to-noise converges to the real value 
and the effect is not observed. The distribution of measured 
eccentricities is non-symmetric when — > 0.0, however, so 
the median is always non-zero. Similar plots for the other 
two stars show the same effect to varying degrees. 

The conclusion from this analysis is that there is a small, 
but significant, bias against measuring an eccentricity of 
zero, especially at low signal-to-noise ratios. This bias has 
also been recently rep orted in an independent analysis by 
IShenfc Turned (|2008l ). 

5.3 Period 

We now turn our attention to D[^f.{Pi), which is the inte- 
grated detectability at a given input period. Pi. Figure [T71 
shows Dint (-Pi) for each subset of the HD 179949 data as a 
function of Pi. Perhaps unsurprisingly, a higher density of 
observations leads to a higher detectability of planets at all 
periods. For the 56 and 28 epoch data sets, D[^i{Pi) is a 
mostly linear function of log Pi , with a drop at Pi ~ 2 d and 
a large drop-off Pi <2d (as noted above). For the 14 epoch 
subset, however, D(nt(fO drops sharply at ^ 4 d, rather than 
at Pi < 2 d, reflecting the fact that the reduced data set does 
not sample short periods well. At periods longer than AT 
(=2626 days for HD 179949), A'nt(-Pi) drops off by almost 
a factor of two. 

We have seen the effects of sampling on the period de- 
tectability above, but what role does the data quality (indi- 
cated by the median measurement uncertainty in Table [Sj 
play, if any? To investigate this, we show D[^f.{Pi) for each 
of the three objects studied in Figure [18] Once again, we see 
a drop in detectability at periods below ~ 2 days and at pe- 
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Figure 17. Similar to Figure 1131 except as a function of input 
period. 



Figure 18. Similar to Figure 1141 except as a function of input 
period. 



riods longer than AT. While there is an offset between the 
A'nt(-Pi) for each of the HD 179949 subsets, caused by differ- 
ing observation density, there is no clear offset between the 
three different stars, despite the significantly different ob- 
servation densities (HD 179949 - 47 days/epoch; HD 20782 
- 89 days/epoch; HD 38382 - 144 days/epoch). If examine 
the median measurement uncertainties as a proxy for data 
quality, we see for example that HD 20782, which has the 
lowest measurement uncertainties at 2.27ms~^, has typi- 
cally higher period detectabilities, despite having lower ob- 
servation density than HD 179949. This suggests that once 
again a complicated combination of observation density and 
data quality are important in selection functions for Doppler 
planet search data. 

We also consider the detectability at each Pm , the mea- 
sured period, denoted D{^^{Pm), and compare it with the 
Dint{Pi) discussed above. We calculate D{^f^{Pm) by count- 
ing the number of correct detections in equally spaced bins 
of log Pm , normalised by the number of simulations in each 
of bin. At periods up to 1000 d (logP — 3.0), the difference 
in detectabilities are less than 0.5% for all stars. There is a 
small offset for the two longest periods of up to ±5%, which 
is caused by poorly constrained long periods (logP = 3.6) 
"leaking" into the next shorter period bin (logP — 3.3). 



ulations of each star we find variations (see Figure I20p . 
The HD 20782 observations, which have the highest qual- 
ity with a median measurement uncertainty of 2.27 ms~^, 
allow the detection of the lowest planet masses (after false 
positives are removed), as shown in Figure [201 Even though 
there are more observations of HD 179949, the median un- 
certainty of HD 20782 is less than half that star's value. In 
the case of HD 38382, the median uncertainty of the obser- 
vations and the number of epochs appear to be important, 
giving a slightly lower number of planets detected at inter- 
mediate masses (lMjup< Mi < 6Mjup) compared with the 
HD 179949. It is this complexity that shows the importance 
of a star-by-star analysis. 

Finally, we examine the detectability at Mm, the mea- 
sured planet mass, which we denote D[^^{Mm), and once 
again compare it with the corresponding detectability for 
input planet mass. The detectabilities are binned in log Mm, 
with the centroid of each bin set to the corresponding value 
of log Mi to allow comparison. The width of the bins was set 
to half the difference between the adjacent bins. Recall that 
Di'„t(Mi) is calculated by counting the number of detections 
at a given Mi, which assumes that the mass is known a pri- 
ori. We find the difference between D{^^{Mm) and D{^f.{AIi) 
is within ~3% for each star. We do not consider these dif- 
ferences to be significant. 



5.4 Planet mass 

The integrated detectability as a function of planet mass, 
_Dj'„t(Mi) is more complicated than the integrated de- 
tectability as a function of period or eccentricity, because 
planet mass is a function of both of these parameters (as 
well as semi- amplitude) through Equation [5] Figure 1191 
shows D(„t(Afi) as a function of planet mass for the three 
HD 179949 subsets. As one might naively expect, we see that 
more data results in higher detectabilities for a given mass of 
planet. (Recall also that at low masses false positives begin 
to have a significant impact on false positives for sparesly 
sampled data - they represent up to 20% as a fraction of 
the total detections at M<0.2Mjup, leading to an apparently 
higher detectability than data sets with more observations.) 
Once again, if we examine the Dl^^(Mi) curves for sim- 



6 SUMMARY 

We have begun a project to investigate the observational bi- 
ases inherent in Doppler velocity data, in particular in the 
Anglo- Australian Planet Search. An essential part of this 
study is the development of a set of tools that will allow the 
automatic detection of exoplanets. We present the 2D Kep- 
lerian Lomb-Scargle periodogram, a new algorithm based on 
an extension of the traditional Lomb-Scargle periodogram, 
which includes eccentricity. This is very efficient at detect- 
ing high eccentricity planets, which we highlight with a re- 
analysis of the extreme object HD 20782b. 

We have constructed Monte-Carlo-like simulations of 
Anglo- Australian Planet Search data that include the time- 
stamps of the observations and a simple noise model that in- 
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Figure 19. Similar to Figure 1131 except as a function of input 
planet mass. 
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Figure 20. Similar to Figure [14] except for each of the three stars 
studied. 



corporates into it the measurement uncertainties. The simu- 
lations cover the full range of physically important exoplanet 
orbital parameters: period, eccentricity and planet mass. As 
part of the simulation system, we have developed a set of 
detection criteria which can be applied to our simulated 
data sets in an automated fashion. We have investigated 
the false positives produced by these detection criteria and 
found them to be quantifiable and at an acceptably low level, 
which enables meaningful conclusions to be reached from our 
simulations. We also find that there is a bias against detect- 
ing zero-eccentricity orbits at low signal-to-noise ratios. 

Finally, we present preliminary results from simulations 
of velocity observations of three AAPS objects: HD 179949, 
HD 38382 and HD 20782. Our investigation shows that the 
detectability of exoplanet orbital parameters differs signif- 
icantly from object to object, meaning that the simple 
parametrisations invoked by previous studies are of limited 
validity. 
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